The candidate gene approach.

Alcoholism has a significant genetic basis, and identifying genes that confer a susceptibility to alcoholism will aid clinicians in preventing and effectively treating the disease. One commonly used technique to identify genetic risk factors for complex disorders such as alcoholism is the candidate gene approach, which directly tests the effects of genetic variants of a potentially contributing gene in an association study. These studies, which may include members of an affected family or unrelated cases and controls, can be performed relatively quickly and inexpensively and may allow identification of genes with small effects. However, the candidate gene approach is limited by how much is known of the biology of the disease being investigated. As researchers identify potential candidate genes using animal studies or linking them to DNA regions implicated through other analyses, the candidate gene approach will continue to be commonly used.


KEY WORDS: genetic theory of AODU (AOD [alcohol or other drug] use, abuse, and dependence); genetic linkage; genetic polymorphism; nucleotides; apolipoproteins; quantitative trait locus; alcohol dehydrogenases; aldehyde dehydrogenases; Alzheimer's disease

Family, twin, and adoption studies have indicated that alcoholism has a strong genetic component (Reich et al. 1999). Although researchers are still investigating its exact nature, several genes of varying effect that confer a susceptibility to alcoholism (i.e., susceptibility genes) likely play a role. Identification of these genes will aid researchers and clinicians in preventing and effectively treating this disorder.
The search for alcoholism susceptibility genes centers on two major techniques, linkage mapping and the candidate gene approach. Linkage mapping, also called positional cloning, is the process of systematically scanning the entire DNA contents (i.e., the genomes) of various members of families affected by the disorder using regularly spaced, highly variable (i.e., polymorphic) DNA segments whose exact position is known (i.e., genetic markers). Using those families, investigators can identify genetic regions associated or "in linkage" with the disease by observing that affected family members share certain marker variants (i.e., alleles) located in those regions more frequently than would be expected by chance. These regions can then be isolated, or cloned, for further analysis and characterization of the responsible genes. Linkage mapping techniques have already resulted in the identification of several potential DNA regions that may contain susceptibility genes for alcoholism (Reich et al. 1999). (For a review of the analogous approach in mice, see the article in this issue on quantitative trait locus [QTL] analysis by Grisel.) The primary advantage of linkage mapping is that investigators need no prior knowledge of the physiology or biology underlying the disorder being studied, which is important for complex disorders, such as alcoholism.
Whereas the linkage mapping approach is an unbiased search of the entire genome without any preconceptions about the role of a certain gene, the candidate gene approach allows researchers to investigate the validity of an "educated guess" about the genetic basis of a disorder. This approach involves assessing the association between a particular allele (or set of alleles) of a gene that may be involved in the disease (i.e., a candidate gene) and the disease itself. In other words, this type of association study tries to answer the question, "Is one allele of a candidate gene more frequently seen in subjects with the disease than in subjects without the disease?" The major difficulty with this approach is that in order to choose a potential candidate gene, researchers must already have an understanding of the mechanisms underlying the disease (i.e., disease pathophysiology). In contrast with linkage mapping studies, however, studies of candidate genes do not require large families with both affected and unaffected members, but can be performed with unrelated cases and control subjects or with small families (e.g., a proband and parents). Furthermore, candidate gene studies are better suited for detecting genes underlying common and more complex diseases where the risk associated with any given candidate gene is relatively small (Collins et al. 1997; Risch and Merikangas 1996).

This article describes the methods used in candidate gene studies, including associated methodological and technical considerations, and reviews examples of this approach that are both related and unrelated to alcoholism. Because many of the best known examples of this approach have been conducted in humans, these studies have been highlighted. However, the overall approach is similar in other model organisms, and the differences will be reviewed at the end of this article.

Selecting a Candidate Gene
The first critical step in conducting candidate gene studies is the choice of a suitable candidate gene that may plausibly play a relevant role in the process or disease under investigation. For example, when studying alcoholism, genes encoding enzymes that act in various pathways of alcohol metabolism, such as alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (ALDH), are logical choices. Both enzymes are encoded by more than one gene (i.e., by gene families), and each of these genes exists in several variants, or alleles, allowing for its use in the candidate gene approach.
In alcoholism, as in other addictive disorders, the pathways through which brain chemicals (i.e., neurotransmitters) and other signaling molecules act may also play a role in the development and maintenance of addictive behaviors. The selection of particular genes for further analysis as candidate genes could be facilitated if some of the potentially important genes were located in DNA regions that could be linked to alcoholism in genome screens.

Choosing a DNA Polymorphism
Once investigators have selected a candidate gene, they must decide which polymorphism would be most useful for testing in an association study. To this end, they must identify existing gene variants and determine which of those variants result in proteins with altered functions that might influence the trait of interest.¹ (For more information on the relationship between mutations in the DNA and variations in protein function, see the sidebar, p. 167.) In the case of ALDH, several well-known polymorphisms result in the substitution of certain protein building blocks (i.e., amino acids) and thus can lead to proteins with biologically relevant changes in function. In many cases, however, researchers may know a gene's DNA sequence but may not have any information about functional variation in the gene.
Detecting genetic variants is a laborious process that often involves sequencing (that is, determining the sequence of DNA building blocks, i.e., nucleotides) the entire gene in both affected and unaffected individuals to look for consistent differences.² Alternatively, researchers can employ screening procedures during which they isolate small gene sections from many individuals and compare their mobility in a gelatinous material. Differences in mobility in these analyses may indicate nucleotide variations (Malhotra and Goldman 1999). To confirm that a potential nucleotide variation exists and to determine its exact location in the genome, investigators then must conduct additional studies, typically based on direct sequencing of the DNA section in question. This information also allows researchers to determine whether the nucleotide variation is likely to have functional significance, either because it actually results in amino acid changes in the resulting protein or because it occurs in DNA regions controlling the gene's activity. Finally, to be useful for candidate gene studies, the variant should occur with sufficient frequency to allow detection of differences between individuals with and without the trait under investigation.

Not all genes, however, have an easily identifiable common functional variant that can be exploited in association studies, and in many cases researchers have identified only changes in individual nucleotides (i.e., single nucleotide polymorphisms [SNPs]) that have no known functional significance. Nevertheless, SNPs can be potentially useful in narrowing a linkage region. In addition, by virtue of linkage disequilibrium, SNPs located within or near a disease susceptibility gene may show a statistically significant association with the disease (see the sidebar for a description of this phenomenon).

¹ Both the genes and the proteins they encode frequently are abbreviated with the same letters; however, the names of the genes are usually typed in italics and the names of the proteins in regular letters.

² A typical gene can span 10,000 to 100,000 or more nucleotides of the human genome, of which approximately 2 to 5 percent (i.e., a few thousand nucleotides) consist of the coding sequence and the rest, intronic sequence.
SNPs can be of particular benefit in studies of complex disorders for which many potential candidate genes exist. For example, linkage mapping studies have suggested several genomic areas that may contain susceptibility genes for alcoholism. Each of these areas, however, is so large that it may contain dozens or hundreds of genes depending on the size and gene density of each region. Because it would be prohibitively difficult to sequence all these genes, publicly available SNP data are a great resource for candidate gene and association studies. For example, researchers recently analyzed several SNPs in the DNA region containing a candidate gene for Alzheimer's disease (AD) and demonstrated that two SNPs closely flanking that gene indeed showed strong association with AD (Martin et al. 2000). (For more information on this candidate gene for AD, see the section "Examples of the Candidate Gene Approach in Humans.")
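Linkage disequilibrium between a marker SNP and a nearby functional variant is commonly summarized by the coefficient D and the squared correlation r². The Python sketch below computes both from haplotype and allele frequencies. The function name and all frequencies are hypothetical, chosen only for illustration, and are not taken from the studies cited in this article.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at the first locus
          and allele B at the second locus
    p_a, p_b: population frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two alleles were inherited independently.
    d = p_ab - p_a * p_b
    # r^2 rescales D^2 so that 1 means the two alleles always occur together.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical example 1: marker and functional variant usually co-inherited.
d_linked, r2_linked = linkage_disequilibrium(p_ab=0.28, p_a=0.30, p_b=0.30)

# Hypothetical example 2: the two alleles occur together only as often
# as expected by chance (0.30 * 0.30 = 0.09).
d_indep, r2_indep = linkage_disequilibrium(p_ab=0.09, p_a=0.30, p_b=0.30)

print(f"co-inherited: D = {d_linked:.3f}, r^2 = {r2_linked:.3f}")
print(f"independent:  D = {d_indep:.3f}, r^2 = {r2_indep:.3f}")
```

An r² close to 1 suggests that the marker SNP could stand in for the functional variant in an association study; an r² near 0 suggests that it carries little information about it.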

Testing the Candidate Gene
Once investigators have chosen a candidate gene and suitable polymorphism, they commonly test the role of this gene in a sample of randomly chosen subjects with the disease (i.e., cases) and without the disease (i.e., controls). Such subject groups are relatively easy to obtain, giving the candidate gene approach an important advantage over the linkage mapping approach, which requires the analysis of families with multiple affected members. Additional advantages of the case-control study design over linkage-based methods include the following (Malhotra and Goldman 1999):
• Researchers can more easily obtain large numbers of cases and control subjects.
• The effect of disease heterogeneity (i.e., that a disease may have multiple genetic causes despite a similar disease phenotype) is less problematic.
• Researchers do not need to make assumptions about the exact mode of disease transmission before conducting their analyses.
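One common way to compare allele frequencies between cases and controls is a 2x2 table of allele counts, summarized by an odds ratio and a Pearson chi-square test with one degree of freedom. The sketch below is a minimal illustration of that idea, not the procedure of any particular study cited here; the function name and the counts are hypothetical.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Odds ratio, Pearson chi-square (1 df), and p-value for allele counts.

    Each person contributes two alleles, so 100 cases yield 200 case alleles.
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Shortcut formula for the Pearson chi-square statistic of a 2x2 table.
    chi_square = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 100 cases (60 risk alleles, 140 other alleles)
# and 100 controls (40 risk alleles, 160 other alleles).
odds_ratio, chi_square, p_value = allelic_association(60, 140, 40, 160)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

An odds ratio above 1 means the allele is seen more often in cases than in controls. Whether that reflects a real effect depends on how well the controls are matched to the cases.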
The major problem associated with the case-control design is that it may result in spurious associations if the controls are not appropriately matched to the cases with respect to ethnicity or other factors that influence an individual's genetic composition.
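The arithmetic behind such a spurious association can be sketched with two hypothetical populations. In each one the allele is equally common in cases and controls, so the odds ratio is exactly 1, yet pooling them produces an apparent association because one population contributes most of the cases and also carries the allele more often. All counts below are invented for illustration.

```python
def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio for a 2x2 table of allele counts."""
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical allele counts, in the order:
# (case_risk, case_other, control_risk, control_other)
population_a = (80, 80, 20, 20)   # allele frequency 0.5, supplies most cases
population_b = (4, 36, 16, 144)   # allele frequency 0.1, supplies most controls

# Pooling ignores population membership, as an unmatched study would.
pooled = tuple(x + y for x, y in zip(population_a, population_b))

or_a = odds_ratio(*population_a)
or_b = odds_ratio(*population_b)
or_pooled = odds_ratio(*pooled)

print(f"population A: OR = {or_a:.2f}")
print(f"population B: OR = {or_b:.2f}")
print(f"pooled:       OR = {or_pooled:.2f}")
```

In this toy example the pooled odds ratio is well above 1 even though the allele has no effect within either population.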

Examples of the Candidate Gene Approach in Humans
A widely cited example of the usefulness of the candidate gene approach involves AD, the most common cause of dementia in the elderly. AD typically is a late-onset disorder (i.e., the earliest symptoms occur after age 60) with a complex inheritance pattern. The disease often appears to occur sporadically, even when there is an underlying genetic predisposition. One of the pathologic hallmarks of AD is the presence of microscopic aggregates, or plaques, of a small protein-like molecule called β-amyloid peptide. These β-amyloid plaques also contain several other proteins, including one called apolipoprotein E (ApoE), whose gene (APOE) is located on chromosome 19. ApoE was implicated in the development of AD by findings that it binds tightly to β-amyloid in the fluid surrounding the brain and spinal cord (i.e., the cerebrospinal fluid) (Strittmatter et al. 1993). Furthermore, prior linkage data had indicated that a gene for late-onset AD was located on chromosome 19 (Pericak-Vance et al. 1991), in a region that included the APOE gene. Based on these findings, researchers conducted an association study comparing the frequency of three APOE alleles called E2, E3, and E4 in 30 cases and 91 unrelated controls (Strittmatter et al. 1993). The investigators found that whereas all alleles occurred in the controls, the APOE*E4 allele was greatly overrepresented in the AD cases, indicating that this allele is a major risk factor for the development of AD. This robust association between APOE*E4 and AD has been confirmed in many subsequent studies (for a review, see St. George-Hyslop 2000).
With respect to alcoholism, researchers have used the candidate gene approach to investigate the association between certain ADH and ALDH alleles and an altered risk of alcoholism. Studies have found that the enzyme encoded by an ALDH allele called ALDH2*2 degrades acetaldehyde more slowly than normal, resulting in the prolongation of certain unpleasant alcohol effects, such as facial flushing, racing of the heart (i.e., palpitations), and nausea. Not surprisingly, this allele appears to have a protective effect against alcoholism; that is, people carrying the allele are less likely to consume alcohol and to develop alcoholism (for a review, see Reich et al. 1999). The frequency of the ALDH2*2 allele is particularly high in some Asian populations, and carriers of this allele consume less alcohol and are much less likely to develop alcoholism than are people without the allele.

The Candidate Gene Approach in Mouse Studies
Quantitative trait loci (QTLs) are DNA regions that may contain one or more genes related to the development of a certain quantitative trait. Mapping of QTLs in animal models of alcohol-related phenotypes has identified multiple genomic areas that potentially contain candidate genes for these phenotypes. (For more information on QTL mapping, see the article in this issue by Grisel.) The methods of identifying these candidate genes and any potential functional variants are essentially the same as those used in humans. Once functional variants are found, however, any positive association between a variant and the trait of interest must be interpreted with caution. For example, because of the way mouse strains are bred, mice that have a trait (analogous to human cases) and mice that do not have that trait (analogous to controls) may possess different alleles at a particular gene even if that gene is unrelated to the disease (or trait) under consideration. The gene that actually confers the risk for the disease or trait under investigation may be located near the gene showing the allelic polymorphism, but may be difficult to identify positively using association methods alone.

Genes and Mutations
DNA, the genetic material contained in each cell, encodes the information for all the proteins needed to create and maintain an organism. The information for each protein is contained within one gene. Genes represent only a small portion of a cell's entire DNA (i.e., the genome), however, and stretches of DNA both between and within genes are not converted into proteins. Some of these "noncoding" DNA stretches (e.g., promoters) regulate the activity of the genes and determine which gene is turned "on" or "off" in a given cell at a given time. This regulation is necessary, because not all cells need to generate all proteins at all times, and excessive or untimely protein production can lead to disease. For example, only blood cells need to produce the protein hemoglobin, which carries oxygen from the lungs to the tissues. Noncoding DNA stretches within genes are called introns. They are cut (i.e., spliced) out of an intermediary molecule called messenger RNA during the conversion of the genetic information in the DNA into a protein. This splicing process must be highly accurate in order to ensure that the resulting protein is functional.
DNA is a long, thread-like molecule whose building blocks, the nucleotides, consist of sugar molecules linked to organic bases. There are four such bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The order, or sequence, in which the nucleotides are arranged specifies the order in which the building blocks of the resulting proteins (i.e., the amino acids) are combined. Because there are 20 amino acids but only 4 different nucleotides, a triplet of three nucleotides (i.e., a codon) represents one specific amino acid. However, the 4 nucleotides can be arranged into 64 different triplets, far more than the 20 codons needed to represent each amino acid. As a result, the genetic code is redundant, which means that more than one codon can represent a particular amino acid. For example, only the codon ATG represents the amino acid methionine; however, four different codons (GCA, GCC, GCG, and GCT) represent the amino acid alanine.
Both during the DNA duplication that occurs when cells divide and as the result of external factors (e.g., exposure to radiation or certain chemicals), changes in the nucleotide sequence (i.e., mutations) can occur. If these changes result in altered proteins that contribute to the development of a different phenotype, they represent polymorphisms that can be useful in candidate gene studies. Many mutations do not result in amino acid changes, however, and therefore do not alter the resulting protein or its function. For example, because of the redundant nature of the DNA code, some mutations result in codons that still specify the same amino acid. Thus, if a mutation occurred in the last nucleotide of the GCA triplet, all three possible new triplets (GCC, GCG, and GCT) would still encode the amino acid alanine. Nevertheless, these single nucleotide polymorphisms (SNPs) can be useful in linkage studies, as described in the main article.
Furthermore, many mutations occur in noncoding DNA regions and therefore do not result in protein variants that are associated with an altered phenotype or increased disease risk. Under two conditions, however, even mutations in noncoding regions might result in an altered phenotype and therefore be useful in candidate gene studies. First, mutations that occur in regulatory regions, such as promoters or intron splice sites, could alter gene activity and, consequently, the phenotype determined by that gene. Second, noncoding mutations that occur in an intron or outside a gene could be associated with an altered phenotype if they are positioned close to (i.e., typically within 200,000 nucleotides) a functional mutation and are therefore almost always inherited together with the functional mutation. This phenomenon is known as "linkage disequilibrium." In all other cases, an observed association between a noncoding mutation and a disease may be a consequence of population stratification (that is, general differences between cases and controls if both subject groups are drawn from different underlying populations, e.g., ethnic groups or animal strains) or a chance event.
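The redundancy of the genetic code described in the "Genes and Mutations" sidebar can be checked with a short Python sketch. It enumerates all nucleotide triplets and confirms that every third-position change to the alanine codon GCA still yields an alanine codon. The lookup table is deliberately partial and contains only the codons named in the sidebar; the function and variable names are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Every possible triplet of the four DNA bases: 4 * 4 * 4 = 64 codons.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Partial codon table: only the codons mentioned in the sidebar.
CODON_TO_AMINO_ACID = {
    "ATG": "methionine",
    "GCA": "alanine",
    "GCC": "alanine",
    "GCG": "alanine",
    "GCT": "alanine",
}


def third_position_variants(codon):
    """Return the three codons that differ from `codon` only in the last base."""
    return [codon[:2] + base for base in BASES if base != codon[2]]


variants = third_position_variants("GCA")
all_synonymous = all(
    CODON_TO_AMINO_ACID[variant] == CODON_TO_AMINO_ACID["GCA"]
    for variant in variants
)

print(len(all_codons))   # number of possible codons
print(variants)          # single-base changes at the third position of GCA
print(all_synonymous)    # True if every variant still encodes alanine
```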

Conclusion
A combination of linkage mapping and a candidate gene approach has been the most successful method of identifying disease genes to date. The candidate gene approach is useful for quickly determining the association of a genetic variant with a disorder and for identifying genes of modest effect. This approach has certain advantages over traditional linkage mapping or positional cloning approaches. The current methods for evaluating risk associated with candidate genes complement traditional linkage efforts in identifying susceptibility genes for alcoholism. As more SNPs are identified throughout the genome, some of those SNPs also will be located within candidate genes, thereby allowing researchers the use of the candidate gene approach on a genome-wide scale.